155 research outputs found

    Load Balancing for Mobility-on-Demand Systems

    In this paper we develop methods for maximizing the throughput of a mobility-on-demand urban transportation system. We consider a finite group of shared vehicles, located at a set of stations. Users arrive at the stations, pick up vehicles, and drive (or are driven) to their destination station, where they drop off the vehicle. When some origins and destinations are more popular than others, the system will inevitably become unbalanced: vehicles build up at some stations and are depleted at others. We propose a robotic solution to this rebalancing problem in which empty robotic vehicles autonomously drive between stations. We develop a rebalancing policy that minimizes the number of vehicles performing rebalancing trips. To do this, we use a fluid model of the customers and vehicles in the system, which takes the form of a set of nonlinear time-delay differential equations. We then show that the optimal rebalancing policy can be found as the solution to a linear program. By analyzing the dynamical system model, we show that every station reaches an equilibrium in which there are excess vehicles and no waiting customers. We use this solution to develop a real-time rebalancing policy that can operate in highly variable environments. We verify the policy's performance in a simulated mobility-on-demand environment with stochastic features found in real-world urban transportation networks.
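    A minimal sketch of the kind of linear program described above, assuming known customer arrival rates, destination probabilities, and station-to-station travel times; the formulation and all numbers are illustrative, not taken from the paper:

```python
# A hedged sketch: choose empty-vehicle dispatch rates alpha[i, j] that offset
# the customer-induced imbalance at each station while minimizing the expected
# number of vehicles on rebalancing trips (sum of T[i, j] * alpha[i, j]).
import numpy as np
from scipy.optimize import linprog

def rebalancing_rates(lam, p, T):
    """lam[i]: customer arrival rate at station i; p[i, j]: destination
    probabilities; T[i, j]: travel times. Returns alpha[i, j] >= 0."""
    n = len(lam)
    # Net vehicle inflow at station i due to customer trips.
    inflow = p.T @ lam - lam                      # shape (n,), sums to zero
    # Flow balance: sum_j alpha[i, j] - sum_j alpha[j, i] = inflow[i].
    A_eq = np.zeros((n, n * n))
    for i in range(n):
        for j in range(n):
            A_eq[i, i * n + j] += 1.0             # alpha[i, j] leaves station i
            A_eq[i, j * n + i] -= 1.0             # alpha[j, i] arrives at station i
    res = linprog(c=T.flatten(), A_eq=A_eq, b_eq=inflow, bounds=(0, None))
    return res.x.reshape(n, n)

# Toy example: three stations with asymmetric demand and unit travel times.
lam = np.array([1.0, 0.2, 0.5])
p = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.9, 0.1, 0.0]])
T = np.ones((3, 3)) - np.eye(3)
print(rebalancing_rates(lam, p, T))
```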

    High fidelity progressive reinforcement learning for agile maneuvering UAVs

    In this work, we present a high-fidelity, model-based progressive reinforcement learning method for control system design for an agile maneuvering UAV. Our work relies on a simulation-based training and testing environment for software-in-the-loop (SIL), hardware-in-the-loop (HIL), and integrated flight testing within a photo-realistic virtual reality (VR) environment. Through progressive learning with high-fidelity agent and environment models, the guidance and control policies build agile maneuvering on top of fundamental control laws. First, we provide insight into the development of high-fidelity mathematical models using frequency-domain system identification. These models are then used to design reinforcement learning based adaptive flight control laws, allowing the vehicle to be controlled over a wide range of operating conditions, covering changes such as payload, voltage, and damage to actuators and electronic speed controllers (ESCs). We then design the outer flight guidance and control laws. Our current progress is summarized in this paper.
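    A small, hedged illustration of the frequency-domain system-identification step mentioned above, using a synthetic chirp input and an assumed first-order plant in place of real flight data:

```python
# Estimate an empirical frequency response H(f) = S_uy(f) / S_uu(f) from
# recorded input u and output y. Everything here is synthetic and illustrative.
import numpy as np
from scipy import signal

fs = 200.0                                        # sample rate [Hz], assumed
t = np.arange(0, 60, 1 / fs)
u = signal.chirp(t, f0=0.1, f1=20.0, t1=t[-1])    # frequency-sweep excitation
# Synthetic "vehicle": first-order response plus noise, standing in for flight data.
b, a = signal.butter(1, 5.0, fs=fs)
y = signal.lfilter(b, a, u) + 0.01 * np.random.randn(len(u))

f, S_uy = signal.csd(u, y, fs=fs, nperseg=2048)   # cross-spectral density
_, S_uu = signal.welch(u, fs=fs, nperseg=2048)    # input auto-spectral density
H = S_uy / S_uu
print("estimated gain at 1 Hz ~", np.abs(H[np.argmin(np.abs(f - 1.0))]))
```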

    Actuator Constrained Trajectory Generation and Control for Variable-Pitch Quadrotors

    Control and trajectory generation algorithms for a quadrotor helicopter with variable-pitch propellers are presented. The control law is not based on near-hover assumptions, allowing for large attitude deviations from hover. The trajectory generation algorithm fits a time-parametrized polynomial through any number of waypoints in R^3, with a closed-form solution if the corresponding waypoint arrival times are known a priori. When time is not specified, an algorithm for finding minimum-time paths subject to hardware actuator saturation limitations is presented. Attitude-specific constraints are easily embedded in the polynomial path formulation, allowing aerobatic maneuvers to be performed using a single controller and trajectory generation algorithm. Experimental results on a variable-pitch quadrotor demonstrate the control design and example trajectories. National Science Foundation (U.S.) (Graduate Research Fellowship under Grant No. 0645960)
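    A hedged sketch of the closed-form waypoint-fitting idea when arrival times are known, using a single polynomial per axis through a Vandermonde solve; the paper's formulation also handles attitude and actuator constraints, which are omitted here:

```python
# Fit one time-parametrized polynomial per axis through waypoints in R^3,
# given the arrival times, by solving a square Vandermonde system.
import numpy as np

def fit_polynomial_trajectory(times, waypoints):
    """times: (m,) arrival times; waypoints: (m, 3) positions.
    Returns C of shape (m, 3) with p(t) = sum_k C[k] * t**k."""
    A = np.vander(times, N=len(times), increasing=True)   # (m, m)
    return np.linalg.solve(A, waypoints)                  # one column per axis

def evaluate(C, t):
    powers = np.array([t ** k for k in range(C.shape[0])])
    return powers @ C

times = np.array([0.0, 1.0, 2.0, 3.5])
waypoints = np.array([[0, 0, 0], [1, 2, 1], [3, 2, 2], [4, 0, 1]], dtype=float)
C = fit_polynomial_trajectory(times, waypoints)
print(evaluate(C, 1.0))   # reproduces the second waypoint
```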

    Dynamic Bayesian Combination of Multiple Imperfect Classifiers

    Classifier combination methods need to make the best use of the outputs of multiple, imperfect classifiers to enable higher-accuracy classifications. In many situations, such as when human decisions need to be combined, the base decisions can vary enormously in reliability. A Bayesian approach to such uncertain combination allows us to infer the differences in performance between individuals and to incorporate any available prior knowledge about their abilities when training data is sparse. In this paper we explore Bayesian classifier combination, using the computationally efficient framework of variational Bayesian inference. We apply the approach to real data from a large citizen science project, Galaxy Zoo Supernovae, and show that our method far outperforms other established approaches to imperfect decision combination. We go on to analyse the putative community structure of the decision makers, based on their inferred decision-making strategies, and show that natural groupings are formed. Finally, we present a dynamic Bayesian classifier combination approach and investigate the changes in base classifier performance over time. Comment: 35 pages, 12 figures
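    A much-simplified, non-variational stand-in for the combination rule: per-classifier confusion matrices with Dirichlet smoothing, combined under an independence assumption. The paper infers these quantities with variational Bayes; this sketch only illustrates how per-classifier reliability enters the combined posterior:

```python
# Learn a smoothed confusion matrix per base classifier from a small labelled
# set, then combine new decisions with a naive-Bayes product rule.
import numpy as np

def fit_confusions(base_labels, true_labels, n_classes, alpha=1.0):
    """base_labels: (K, N) decisions of K base classifiers.
    Returns pi[k, true, observed], row-normalised with Dirichlet smoothing alpha."""
    K, _ = base_labels.shape
    pi = np.full((K, n_classes, n_classes), alpha)
    for k in range(K):
        for y, c in zip(true_labels, base_labels[k]):
            pi[k, y, c] += 1.0
    return pi / pi.sum(axis=2, keepdims=True)

def combine(decisions, pi, prior):
    """decisions: (K,) one label per base classifier; returns posterior over classes."""
    log_post = np.log(prior).copy()
    for k, c in enumerate(decisions):
        log_post += np.log(pi[k, :, c])
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Toy example: two noisy binary classifiers.
base = np.array([[0, 1, 1, 0, 1], [0, 0, 1, 0, 1]])
truth = np.array([0, 1, 1, 0, 1])
pi = fit_confusions(base, truth, n_classes=2)
print(combine(np.array([1, 1]), pi, prior=np.array([0.5, 0.5])))
```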

    Comparison of Fixed and Variable Pitch Actuators for Agile Quadrotors

    This paper presents the design, analysis, and experimental testing of a variable-pitch quadrotor. A custom in-lab built quadrotor with on-board attitude stabilization is developed and tested. An analysis of the dynamic differences in thrust output between a fixed-pitch and a variable-pitch propeller is given and validated with simulation and experimental results. It is shown that variable-pitch actuation has significant advantages over the conventional fixed-pitch configuration, including increased thrust rate of change, decreased control saturation, and the ability to quickly and efficiently reverse thrust. These advantages result in improved quadrotor tracking of linear and angular acceleration command inputs in both simulation and hardware testing. The benefits should enable more aggressive and aerobatic flying with the variable-pitch quadrotor than with standard fixed-pitch actuation, while retaining much of the mechanical simplicity and robustness of the fixed-pitch quadrotor. Aurora Flight Sciences Corp. National Science Foundation (U.S.) (Graduate Research Fellowship Grant 0645960)
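    A toy numerical comparison of the thrust-response argument above, assuming a first-order motor lag for the fixed-pitch rotor and a much faster servo lag for blade pitch; the time constants and the linearized thrust relations are illustrative assumptions, not measured values:

```python
# Compare how quickly thrust follows a step command when it must come through
# rotor-speed changes (fixed pitch) versus blade-pitch changes (variable pitch).
import numpy as np

dt, t_end = 0.001, 0.6
t = np.arange(0, t_end, dt)
tau_motor, tau_servo = 0.12, 0.02        # assumed actuator time constants [s]

def first_order_step(tau):
    """Response of a unit step command through a first-order lag."""
    x = np.zeros_like(t)
    for i in range(1, len(t)):
        x[i] = x[i - 1] + dt / tau * (1.0 - x[i - 1])
    return x

omega = first_order_step(tau_motor)      # normalised rotor speed
theta = first_order_step(tau_servo)      # normalised blade pitch
thrust_fixed = omega ** 2                # T ~ omega^2 at fixed pitch
thrust_variable = theta                  # T ~ pitch at constant speed (linearised)

for frac in (0.63, 0.95):
    tf = t[np.argmax(thrust_fixed >= frac)]
    tv = t[np.argmax(thrust_variable >= frac)]
    print(f"time to {frac:.0%} thrust: fixed-pitch {tf*1000:.0f} ms, "
          f"variable-pitch {tv*1000:.0f} ms")
```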

    Deep active learning for autonomous navigation.

    Imitation learning refers to an agent's ability to mimic a desired behavior by learning from observations. A major challenge in learning from demonstrations is representing the demonstrations in a manner that is adequate for learning and efficient for real-time decisions. Creating feature representations is especially challenging when they must be extracted from high-dimensional visual data. In this paper, we present a method for imitation learning from raw visual data. The proposed method is applied to a popular imitation learning domain that is relevant to a variety of real-life applications, namely navigation. To create a training set, a teacher uses an optimal policy to perform a navigation task, and the actions taken are recorded along with visual footage from the first-person perspective. Features are automatically extracted and used to learn a policy that mimics the teacher via a deep convolutional neural network. A trained agent can then predict which action to perform based on the scene it finds itself in. This method is generic, and the network is trained without knowledge of the task, targets, or environment in which it is acting. Another common challenge in imitation learning is generalizing a policy to situations that are not represented in the training data. To address this challenge, the learned policy is subsequently improved by employing active learning: while the agent is executing a task, it can query the teacher for the correct action to take in situations where it has low confidence. The active samples are added to the training set and used to update the initial policy. The proposed approach is demonstrated on four different tasks in a 3D simulated environment. The experiments show that an agent can effectively perform imitation learning from raw visual data for navigation tasks, and that active learning can significantly improve the initial policy using a small number of samples. The simulated test bed facilitates reproduction of these results and comparison with other approaches.
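    A hedged sketch of the general recipe: a small convolutional policy network trained on (image, teacher action) pairs, plus a confidence-based rule for querying the teacher. The architecture, input size, and threshold below are assumptions, not the authors' settings:

```python
# Behavioral cloning from raw images with an active-learning query rule.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyCNN(nn.Module):
    def __init__(self, n_actions=4):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)
        self.fc = nn.Linear(32 * 13 * 13, n_actions)   # sized for 64x64 RGB input

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return self.fc(x.flatten(start_dim=1))

def train_step(policy, optimizer, images, teacher_actions):
    """One supervised update on a batch of demonstration frames."""
    optimizer.zero_grad()
    loss = F.cross_entropy(policy(images), teacher_actions)
    loss.backward()
    optimizer.step()
    return loss.item()

def should_query_teacher(policy, image, threshold=0.6):
    """Active-learning rule: ask the teacher when the policy is uncertain."""
    with torch.no_grad():
        probs = F.softmax(policy(image.unsqueeze(0)), dim=1)
    return probs.max().item() < threshold

# Toy usage with random tensors standing in for recorded demonstrations.
policy = PolicyCNN()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
images, actions = torch.rand(8, 3, 64, 64), torch.randint(0, 4, (8,))
print(train_step(policy, opt, images, actions))
print(should_query_teacher(policy, torch.rand(3, 64, 64)))
```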

    Embodied imitation-enhanced reinforcement learning in multi-agent systems

    Imitation is an example of social learning in which an individual observes and copies another's actions. This paper presents a new method for using imitation to enhance the learning speed of individual agents that employ a well-known reinforcement learning algorithm, namely Q-learning. Compared with other research that combines imitation with reinforcement learning, our method uses imitation of purely observed behaviours to enhance learning, with no internal state access or sharing of experiences between agents. The paper evaluates our imitation-enhanced reinforcement learning approach both in simulation and with real robots in continuous space. Both simulation and real-robot experimental results show that the learning speed of the group is improved.
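    A hedged, tabular illustration of the idea on a toy chain world: a Q-learner whose exploratory actions are sometimes replaced by the action another agent was observed to take in the same state, with no access to that agent's internal values or rewards. The environment and parameters are invented for the sketch:

```python
# Q-learning where exploration is biased toward observed behaviour of another agent.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 2                       # chain world: move left/right
Q = np.zeros((n_states, n_actions))
observed_actions = np.ones(n_states, dtype=int)   # demonstrator seen moving right

def step(s, a):
    s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    return s_next, 1.0 if s_next == n_states - 1 else 0.0

alpha, gamma, eps, p_imitate = 0.1, 0.95, 0.2, 0.5
for episode in range(200):
    s = 0
    for _ in range(50):
        if rng.random() < eps:
            # Exploration: with probability p_imitate, copy the observed behaviour.
            a = observed_actions[s] if rng.random() < p_imitate else rng.integers(n_actions)
        else:
            a = int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r > 0:
            break
print(np.argmax(Q, axis=1))   # learned greedy action per state
```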

    The field high-amplitude SX Phe variable BL Cam: results from a multisite photometric campaign. II. Evidence of a binary - possibly triple - system

    Short-period, high-amplitude pulsating stars of Population I (δ Sct stars) and Population II (SX Phe variables) exist in the lower part of the classical (Cepheid) instability strip. Most of them have very simple pulsational behaviours, with only one or two radial modes excited. Nevertheless, BL Cam is a unique object among them, being an extremely metal-deficient field high-amplitude SX Phe variable with a large number of frequencies. Based on a frequency analysis, a pulsational interpretation was previously given. Aims: We attempt to interpret the long-term behaviour of the residuals that were not taken into account in the previous Observed-Calculated (O-C) short-term analyses. Methods: An investigation of the O-C times was carried out, using a data set based on previously published times of light maxima, largely enriched by those obtained during an intensive multisite photometric campaign of BL Cam lasting several months. Results: In addition to a positive secular relative increase of (161 ± 3) × 10⁻⁹ yr⁻¹ in the main pulsation period of BL Cam, we detected in the O-C data short-term (144.2 d) and long-term (~3400 d) variations, both incompatible with a scenario of stellar evolution. Conclusions: Interpreted as a light travel-time effect, the short-term O-C variation is indicative of a massive stellar companion (0.46 to 1 M_⊙) in a short-period orbit (144.2 d), within a distance of 0.7 AU from the primary. More observations are needed to confirm the long-term O-C variations: if they too were caused by a light travel-time effect, they could be interpreted in terms of a third component, in this case probably a brown dwarf (≥ 0.03 M_⊙), orbiting in ~3400 d at a distance of 4.5 AU from the primary. Comment: 7 pages, 5 figures, accepted for publication in A&
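    A hedged sketch of the kind of O-C analysis described above, fitting synthetic times of maximum light with a quadratic secular term plus a light travel-time sinusoid. The orbital period is taken from the abstract; the pulsation period is an assumed approximate value for BL Cam, and the data and fitted numbers are made up:

```python
# Fit O-C residuals versus cycle number with a linear ephemeris correction,
# a quadratic secular period change, and a light travel-time sinusoid.
import numpy as np
from scipy.optimize import curve_fit

P_puls = 0.0391      # main pulsation period [d]; approximate value, an assumption
P_orbit = 144.2      # orbital period [d], from the abstract

def oc_model(E, a, b, c, A, phi):
    """O-C [d] versus cycle number E."""
    t = E * P_puls
    return a + b * E + c * E**2 + A * np.sin(2 * np.pi * t / P_orbit + phi)

E = np.linspace(0, 60000, 300)                        # cycle numbers of observed maxima
true = oc_model(E, 1e-4, 2e-8, 3e-13, 6e-4, 0.8)      # synthetic "truth"
oc_obs = true + 2e-4 * np.random.randn(E.size)        # synthetic measurement noise
params, _ = curve_fit(oc_model, E, oc_obs, p0=[0, 0, 0, 1e-3, 0])
print("fitted light travel-time amplitude [s]:", params[3] * 86400)
```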

    Deep imitation learning for 3D navigation tasks

    Deep learning techniques have shown success in learning from raw, high-dimensional data in various applications. While deep reinforcement learning has recently gained popularity as a method to train intelligent agents, the use of deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient way to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep Q-networks (DQN) and asynchronous advantage actor-critic (A3C). The proposed method, as well as the reinforcement learning methods, employs deep convolutional neural networks and learns directly from raw visual input. Methods for combining learning from demonstrations and learning from experience are also investigated; this combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on four navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem relevant to many real applications; they pose the challenge of requiring demonstrations of long trajectories to reach the target while providing only delayed (usually terminal) rewards to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input, whereas the learning-from-experience methods fail to learn an effective policy. Moreover, it is shown that active learning can significantly improve the performance of the initially learned policy using a small number of active samples.
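    One way the "combining demonstrations and experience" idea could look, sketched as a single network trained with a TD loss on experience plus a supervised loss on demonstrated actions; this is an assumed combination for illustration, not necessarily the one used in the paper:

```python
# Combined loss: one-step Q-learning on experience + behavioral cloning on demos.
import torch
import torch.nn.functional as F

def combined_loss(q_net, target_net, demo_batch, exp_batch, gamma=0.99, lam=1.0):
    # Supervised imitation term on demonstration (state, teacher action) pairs.
    demo_states, demo_actions = demo_batch
    imitation = F.cross_entropy(q_net(demo_states), demo_actions)

    # One-step TD term on experience (s, a, r, s', done) tuples.
    s, a, r, s_next, done = exp_batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    td = F.smooth_l1_loss(q_sa, target)
    return td + lam * imitation

# Toy usage with a linear Q-network over an 8-dimensional state.
q_net, target_net = torch.nn.Linear(8, 4), torch.nn.Linear(8, 4)
demo = (torch.rand(16, 8), torch.randint(0, 4, (16,)))
exp = (torch.rand(16, 8), torch.randint(0, 4, (16,)), torch.rand(16),
       torch.rand(16, 8), torch.zeros(16))
print(combined_loss(q_net, target_net, demo, exp).item())
```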

    Ground Delay Program Analytics with Behavioral Cloning and Inverse Reinforcement Learning

    We used historical data to build two types of models that predict Ground Delay Program (GDP) implementation decisions and also produce insights into how and why those decisions are made. More specifically, we built behavioral cloning and inverse reinforcement learning models that predict hourly Ground Delay Program implementation at Newark Liberty International and San Francisco International airports. Data available to the models include actual and scheduled air traffic metrics and observed and forecast weather conditions. We found that the random forest behavioral cloning models we developed are substantially better at predicting hourly Ground Delay Program implementation for these airports than the inverse reinforcement learning models we developed. However, all of the models struggle to predict the initialization and cancellation of Ground Delay Programs. We also investigated the structure of the models in order to gain insights into Ground Delay Program implementation decision making. Notably, characteristics of both types of models suggest that GDP implementation decisions are more tactical than strategic: they are made primarily based on current conditions or conditions anticipated only in the next couple of hours.
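    A hedged sketch of the behavioral-cloning setup described above: a random forest that predicts whether a GDP is in effect in a given hour from traffic and weather features. The feature names, data, and labels below are placeholders, not the study's actual inputs:

```python
# Random-forest behavioral cloning of hourly GDP implementation decisions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_hours = 5000
X = np.column_stack([
    rng.poisson(40, n_hours),        # scheduled arrivals in the hour (assumed feature)
    rng.normal(8, 3, n_hours),       # forecast visibility [mi] (assumed feature)
    rng.normal(15, 8, n_hours),      # forecast wind speed [kt] (assumed feature)
])
# Placeholder label: GDP more likely under low visibility and high demand.
y = ((X[:, 1] < 5) & (X[:, 0] > 38)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("hourly GDP prediction accuracy:", clf.score(X_te, y_te))
print("feature importances:", clf.feature_importances_)
```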